Generalized linear statistics are a unifying class that contains U-statistics, U-quantiles, L-statistics as well as trimmed and winsorized U-statistics. For example, many commonly used estimators of scale fall into this class. So far, GL-statistics have been studied only under independence; in this paper, we develop an asymptotic theory for GL-statistics of sequences which are strongly mixing or $L^1$ near epoch dependent on an absolutely regular process. For this purpose, we prove an almost sure approximation of the empirical U-process by a Gaussian process. With the help of a generalized Bahadur representation, it follows that such a strong invariance principle also holds for the empirical U-quantile process and consequently for GL-statistics. We obtain central limit theorems and laws of the iterated logarithm for U-processes, U-quantile processes and GL-statistics as straightforward corollaries.

1 Introduction 



U-Statistics and the Empirical U-Process

Throughout the paper, $(X_n)_{n\in\mathbb{N}}$ is a stationary, real-valued sequence of random variables. A U-statistic $U_n(g)$ can be described as a generalized mean, i.e. the mean of the values $g(X_i, X_j)$, $1 \le i < j \le n$, where $g$ is a bivariate, symmetric and measurable kernel. The following two estimators of scale are U-statistics:

Example 1.1. Consider $g(x, y) = \frac{1}{2}(x - y)^2$. A short calculation shows that the related U-statistic is the well-known variance estimator

$$U_n(g) = \frac{1}{n-1} \sum_{1 \le i \le n} (X_i - \bar{X})^2.$$



Example 1.2. Let $g(x, y) = |x - y|$. Then the corresponding U-statistic is

$$U_n(g) = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} |X_i - X_j|,$$

known as Gini's mean difference.
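The two examples can be checked numerically. The following is a minimal sketch, not taken from the paper: the helper name `u_statistic` and the small data set are illustrative assumptions. It averages a symmetric kernel over all pairs $i < j$ and compares the result for the kernel of Example 1.1 with the usual sample variance.

```python
import itertools

import numpy as np


def u_statistic(x, g):
    """Average of the symmetric kernel g over all pairs i < j (hypothetical helper)."""
    values = [g(a, b) for a, b in itertools.combinations(x, 2)]
    return sum(values) / len(values)


def variance_kernel(a, b):
    # Kernel of Example 1.1: g(x, y) = (x - y)^2 / 2
    return 0.5 * (a - b) ** 2


def gini_kernel(a, b):
    # Kernel of Example 1.2: g(x, y) = |x - y|
    return abs(a - b)


x = np.array([1.0, 2.0, 4.0, 7.0])  # illustrative data
var_u = u_statistic(x, variance_kernel)
gini = u_statistic(x, gini_kernel)

# The U-statistic with the variance kernel agrees with the unbiased sample variance.
print(var_u, np.var(x, ddof=1))
print(gini)
```

The pairwise loop costs $O(n^2)$ kernel evaluations, which is fine for illustration but not for large samples.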

For U-statistics of independent random variables, the central limit theorem (CLT) goes back to Hoeffding [22] and was extended to absolutely regular sequences by Yoshihara [37], to near epoch dependent sequences on absolutely regular processes by Denker and Keller [19] and to strongly mixing random variables by Dehling and Wendler [17]. The law of the iterated logarithm (LIL) under independence was proved by Serfling [32] and was extended to strongly mixing and near epoch dependent sequences by Dehling and Wendler [18].

Not only U-statistics with a fixed kernel $g$ are of interest, but also the empirical U-distribution function $(U_n(t))_{t\in\mathbb{R}}$, which is for fixed $t$ a U-statistic with kernel $h(x, y, t) := 1_{\{g(x,y) \le t\}}$. The Grassberger-Procaccia and the Takens estimators of the correlation dimension in a dynamical system are based on the empirical U-distribution function, see Borovkova et al. [12].

The functional CLT for the empirical U-distribution function has been established by Arcones and Gine [5] for independent data, by Arcones and Yu [7] for absolutely regular data, and by Borovkova et al. [12] for data that are near epoch dependent on absolutely regular processes. The functional LIL for the empirical U-distribution function has been proved by Arcones [2] and by Arcones and Gine [5] under independence. The strong invariance principle has been investigated by Dehling et al. [16]. We will show a strong invariance principle under dependence. As a corollary, we will obtain the LIL for sequences which are strongly mixing or $L^1$ near epoch dependent on an absolutely regular process, and the CLT under conditions which are slightly different from the conditions in Borovkova et al. [12]. Let us now proceed with precise definitions:

Definition 1.3. We call a measurable function $h: \mathbb{R} \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ which is symmetric in the first two arguments a kernel function. For fixed $t \in \mathbb{R}$, we call

$$U_n(t) := \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} h(X_i, X_j, t)$$

the U-statistic with kernel $h(\cdot, \cdot, t)$ and the process $(U_n(t))_{t\in\mathbb{R}}$ the empirical U-distribution function. We define the U-distribution function as $U(t) := E[h(X, Y, t)]$, where $X$, $Y$ are independent with the same distribution as $X_1$, and the empirical U-process as

$$\left(\sqrt{n}\,(U_n(t) - U(t))\right)_{t\in\mathbb{R}}.$$



The main tool for the investigation of U-statistics is the Hoeffding decomposition into a linear and a so-called degenerate part:

$$U_n(t) = U(t) + \frac{2}{n} \sum_{1 \le i \le n} h_1(X_i, t) + \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} h_2(X_i, X_j, t),$$

where

$$h_1(x, t) := E h(x, Y, t) - U(t),$$
$$h_2(x, y, t) := h(x, y, t) - h_1(x, t) - h_1(y, t) - U(t).$$
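The decomposition is an exact algebraic identity, which can be verified on simulated data. The sketch below is an illustration under stated assumptions, not part of the paper: it uses the fixed kernel $g(x,y) = \frac{1}{2}(x-y)^2$ of Example 1.1 (no parameter $t$) and assumes standard normal data, so that $\theta = E\,g(X,Y) = 1$ plays the role of $U(t)$, $h_1(x) = (x^2 - 1)/2$ and $h_2(x,y) = -xy$.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(50)  # assumption: i.i.d. N(0, 1) data
n = len(x)

theta = 1.0  # E g(X, Y) for g(x, y) = (x - y)^2 / 2 under N(0, 1)


def h1(a):
    # Linear part: E g(a, Y) - theta = (a^2 + 1) / 2 - 1
    return 0.5 * (a ** 2 - 1.0)


def h2(a, b):
    # Degenerate part: g(a, b) - h1(a) - h1(b) - theta = -a * b
    return -a * b


pairs = list(itertools.combinations(range(n), 2))
u_n = np.mean([0.5 * (x[i] - x[j]) ** 2 for i, j in pairs])

linear = 2.0 / n * np.sum(h1(x))
degenerate = 2.0 / (n * (n - 1)) * sum(h2(x[i], x[j]) for i, j in pairs)

# Both sides of the Hoeffding decomposition agree up to rounding error.
print(u_n, theta + linear + degenerate)
```

For dependent data the identity still holds pathwise; only the asymptotic behaviour of the two parts changes.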

We need some technical assumptions to guarantee the convergence of the empirical U-process:

Assumption 1. The kernel function $h$ is bounded and non-decreasing in the third argument. The U-distribution function $U$ is continuous. For all $x, y \in \mathbb{R}$: $\lim_{t\to\infty} h(x, y, t) = 1$, $\lim_{t\to-\infty} h(x, y, t) = 0$.

Furthermore, we will consider dependent random variables, so we need an additional continuity property of the kernel function (which was introduced by Denker and Keller):

Assumption 2. $h$ satisfies the uniform variation condition, that is, there is a constant $L$ such that for all $t \in \mathbb{R}$, $\epsilon > 0$

$$E\left[\sup_{\|(x,y)-(X,Y)\| \le \epsilon} |h(x, y, t) - h(X, Y, t)|\right] \le L\epsilon,$$

where $X$, $Y$ are independent with the same distribution as $X_1$ and $\|\cdot\|$ denotes the Euclidean norm.



Empirical U-Quantiles and GL-Statistics

For $p \in (0, 1)$, the p-th U-quantile $t_p = U^{-1}(p)$ is the inverse of the U-distribution function $U$ at the point $p$ (in general, $U$ does not have to be invertible, but this is guaranteed by our Assumption 3, at least on the interval $I$ introduced in Theorem 2). A natural estimator of a U-quantile is the empirical U-quantile $U_n^{-1}(p)$, which is the generalized inverse of the empirical U-distribution function at the point $p$:

Definition 1.4. Let $p \in (0, 1)$ and let $U_n$ be the empirical U-distribution function.

$$U_n^{-1}(p) := \inf\{t \mid U_n(t) \ge p\}$$

is called the empirical U-quantile.

Empirical U-quantiles have applications in robust statistics.

Example 1.5. Let $h(x, y, t) := 1_{\{|x-y| \le t\}}$. Then the 0.25-U-quantile is the $Q_n$ estimator of scale proposed by Rousseeuw and Croux [31], which is highly robust, as its breakdown point is 50%.
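The empirical U-distribution function and the empirical U-quantile of Definition 1.4 can be computed directly from the sorted pairwise kernel values. The following is a minimal sketch with hypothetical helper names and illustrative data; it computes the plain 0.25-U-quantile for the kernel $|x - y|$ and does not include the consistency constant or the finite-sample conventions that practical implementations of $Q_n$ use.

```python
import itertools

import numpy as np


def pairwise_values(x, g):
    """Sorted values g(x_i, x_j) over all pairs i < j (hypothetical helper)."""
    return np.sort([g(a, b) for a, b in itertools.combinations(x, 2)])


def empirical_u_cdf(sorted_vals, t):
    # U_n(t): fraction of pairs with g(x_i, x_j) <= t
    return np.searchsorted(sorted_vals, t, side="right") / len(sorted_vals)


def empirical_u_quantile(sorted_vals, p):
    # Generalized inverse inf{t : U_n(t) >= p}: U_n jumps to k/N at the
    # k-th smallest pairwise value, so we look for the first k with k/N >= p.
    size = len(sorted_vals)
    grid = np.arange(1, size + 1) / size
    idx = np.searchsorted(grid, p, side="left")
    return sorted_vals[min(idx, size - 1)]


x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])  # illustrative data
vals = pairwise_values(x, lambda a, b: abs(a - b))
q25 = empirical_u_quantile(vals, 0.25)

print(vals)
print(q25, empirical_u_cdf(vals, q25))
```

Sorting all $n(n-1)/2$ pairwise values is the simplest approach; faster selection algorithms exist but are outside the scope of this sketch.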

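The kernel function $h(x, y, t) := 1_{\{|x-y| \le t\}}$ satisfies Assumption 2 (uniform variation condition) if the U-distribution function is Lipschitz continuous. For every $\epsilon > 0$

$$E\left[\sup_{\|(x,y)-(X,Y)\| \le \epsilon} \left|1_{\{|x-y| \le t\}} - 1_{\{|X-Y| \le t\}}\right|\right] \le P\left[t - \sqrt{2}\epsilon < |X - Y| \le t + \sqrt{2}\epsilon\right] \le U(t + \sqrt{2}\epsilon) - U(t - \sqrt{2}\epsilon) \le C\epsilon.$$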



The empirical U-quantile and the empirical U-distribution function have a converse behaviour: $U_n^{-1}(p)$ is greater than $t_p$ iff $U_n(t_p)$ is smaller than $p$. This motivates a generalized Bahadur representation [10]:

$$U_n^{-1}(p) = t_p + \frac{p - U_n(t_p)}{u(t_p)} + R_n(p),$$

where $u = U'$ is the derivative of the U-distribution function. For independent data and fixed $p$, Geertsema [20] established a generalized Bahadur representation with $R_n(p) = o(n^{-3/4} \log n)$ a.s. Dehling et al. [16] and Choudhury and Serfling [14] improved the rate to $R_n(p) = O(n^{-3/4} (\log n)^{3/4})$. Arcones [4] proved the exact order $R_n(p) = O(n^{-3/4} (\log\log n)^{3/4})$, as for sample quantiles. Under strong mixing and near epoch dependence on an absolutely regular process, we recently established rates of convergence for $R_n(p)$ which depend on the decrease of the mixing coefficients [31]. The CLT and the LIL for $U_n^{-1}(p)$ are straightforward corollaries of the convergence of $R_n$ and the corresponding theorems for $U_n(t_p)$.
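To get a feeling for the size of the remainder $R_n(p) = U_n^{-1}(p) - t_p - (p - U_n(t_p))/u(t_p)$, one can evaluate it on simulated data. The sketch below is only an illustration under assumptions that are not from the paper: independent standard normal observations and the kernel $1_{\{|x-y| \le t\}}$, for which $U(t) = 2\Phi(t/\sqrt{2}) - 1$ and $u(t) = \sqrt{2}\,\varphi(t/\sqrt{2})$ in closed form.

```python
from statistics import NormalDist

import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 0.5
x = rng.standard_normal(n)  # assumption: i.i.d. N(0, 1) sample

# All pairwise distances |x_i - x_j| with i < j, sorted.
i, j = np.triu_indices(n, k=1)
vals = np.sort(np.abs(x[i] - x[j]))
size = len(vals)

std = NormalDist()
# X - Y ~ N(0, 2), hence U(t) = 2 * Phi(t / sqrt(2)) - 1.
t_p = np.sqrt(2.0) * std.inv_cdf((1.0 + p) / 2.0)
u_tp = np.sqrt(2.0) * std.pdf(t_p / np.sqrt(2.0))

# Empirical U-distribution function at t_p and empirical U-quantile at p.
U_n_tp = np.searchsorted(vals, t_p, side="right") / size
grid = np.arange(1, size + 1) / size
U_n_inv = vals[min(np.searchsorted(grid, p, side="left"), size - 1)]

# Remainder term of the generalized Bahadur representation.
R_n = U_n_inv - t_p - (p - U_n_tp) / u_tp
print(U_n_inv - t_p, (p - U_n_tp) / u_tp, R_n)
```

In such experiments the estimation error and the linear term are typically close to each other, while the remainder is visibly smaller; the sketch says nothing about the dependent case treated in this paper.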

In this paper, we will study not a single U-quantile, but the empirical U-quantile process $(U_n^{-1}(p))_{p\in I}$ under dependence, where the interval $I$ is given by $I = [\tilde{C}_1, \tilde{C}_2]$ with $U(C_1) < \tilde{C}_1 < \tilde{C}_2 < U(C_2)$ and the constants $C_1$, $C_2$ from Assumption 3 below. In order to do this, we will examine the rate of convergence of $\sup_{p\in I} R_n(p)$ and use the approximation of the empirical U-process by a Gaussian process. As we divide by $u$ in the Bahadur representation, we have to assume that this derivative behaves nicely. Furthermore, we need $U$ to be a bit more than differentiable (but twice differentiable is not needed).

Assumption 3. $U$ is differentiable on an interval $[C_1, C_2]$ with $0 < \inf_{t\in[C_1,C_2]} u(t) \le \sup_{t\in[C_1,C_2]} u(t) < \infty$ (where $u(t) = U'(t)$) and



sup I U(t) - U{t') - uit) (t - t') I = O ( xi) 



t,t'E[Ci,c 2 y. \t-t'\<x



The Bahadur representation for the sample quantile process goes back to Kiefer [23] under independence. Babu and Singh [9] proved such a representation for mixing data, and Kulik [26] and Wu [35] for linear processes, but there seem to be no such results for the U-quantile process.

Furthermore, we are interested in linear functionals of the [/-quantile process. 

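Definition 1.6. Let $p_1, \ldots, p_d \in I$, $b_1, \ldots, b_d \in \mathbb{R}$ and let $J$ be a bounded function that is continuous a.e. and vanishes outside of $I$. We call a statistic of the form

$$T_n = T(U_n^{-1}) := \int_I J(p)\, U_n^{-1}(p)\, dp + \sum_{j=1}^{d} b_j U_n^{-1}(p_j) = \sum_{i=1}^{n(n-1)/2} \int_{\frac{2(i-1)}{n(n-1)}}^{\frac{2i}{n(n-1)}} J(t)\, dt \; U_n^{-1}\!\left(\frac{2i}{n(n-1)}\right) + \sum_{j=1}^{d} b_j U_n^{-1}(p_j)$$

a generalized linear statistic (GL-statistic).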

This generalization of L-statistics was introduced by Serfling [33]. U-statistics, U-quantiles and L-statistics can be written as GL-statistics (though this might be somewhat artificial). For a U-statistic, just take $h(x, y, t) = 1_{\{g(x,y) \le t\}}$ and $J = 1$ (this only works if we can consider the interval $I = [0, 1]$). The following example shows how to deal with an ordinary L-statistic.

Example 1.7. Let $h(x, y, t) := \frac{1}{2}\left(1_{\{x \le t\}} + 1_{\{y \le t\}}\right)$, $p_1 = 0.25$, $p_2 = 0.75$, $b_1 = -1$, $b_2 = 1$, and $J = 0$. Then a short calculation shows that the related GL-statistic is

$$T_n = F_n^{-1}(0.75) - F_n^{-1}(0.25),$$

where $F_n^{-1}$ denotes the empirical sample quantile function. This is the well-known interquartile distance, a robust estimator of scale with 25% breakdown point.
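The "short calculation" rests on the fact that, for this kernel, the empirical U-distribution function coincides with the ordinary empirical distribution function, because every observation appears in exactly $n - 1$ pairs. A minimal numerical sketch (helper names and data are illustrative assumptions, not from the paper):

```python
import itertools

import numpy as np


def u_cdf_example_17(x, t):
    """U_n(t) for the kernel h(x, y, t) = (1{x <= t} + 1{y <= t}) / 2."""
    vals = [0.5 * (float(a <= t) + float(b <= t))
            for a, b in itertools.combinations(x, 2)]
    return sum(vals) / len(vals)


def empirical_cdf(x, t):
    return float(np.mean(x <= t))


def generalized_inverse(x, p):
    # inf{t : F_n(t) >= p} for the empirical distribution function F_n
    xs = np.sort(x)
    grid = np.arange(1, len(xs) + 1) / len(xs)
    idx = np.searchsorted(grid, p, side="left")
    return xs[min(idx, len(xs) - 1)]


x = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5])  # illustrative data

# U_n and F_n agree at every observation (and hence everywhere).
max_gap = max(abs(u_cdf_example_17(x, t) - empirical_cdf(x, t)) for t in x)

# GL-statistic with p_1 = 0.25, p_2 = 0.75, b_1 = -1, b_2 = 1, J = 0.
t_n = generalized_inverse(x, 0.75) - generalized_inverse(x, 0.25)
print(max_gap, t_n)
```

Note that other software may use interpolated quantile definitions, so the value can differ slightly from library interquartile-range functions.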

Example 1.8. Let $h(x, y, t) := 1_{\{\frac{1}{2}(x-y)^2 \le t\}}$, $p_1 = 0.75$, $b_1 = 0.25$ and $J(x) = 1_{\{x \in [0, 0.75]\}}$. The related GL-statistic is called winsorized variance, a robust estimator of scale with 13% breakdown point.

The uniform variation condition also holds in this case, as $h(x, y, t) = 1_{\{\frac{1}{2}(x-y)^2 \le t\}} = 1_{\{|x-y| \le \sqrt{2t}\}}$ and this is the kernel function of Example 1.5.
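For an indicator kernel, $U_n^{-1}$ is a step function, so the integral in the GL-statistic reduces to a finite sum. The sketch below evaluates the statistic of Example 1.8 in this way; the function name and the data are illustrative assumptions. When $0.75 \cdot n(n-1)/2$ is an integer, the result equals the mean of the pairwise values $\frac{1}{2}(X_i - X_j)^2$ after the largest quarter has been replaced by the 0.75-U-quantile.

```python
import itertools

import numpy as np


def winsorized_variance(x, p1=0.75, b1=0.25):
    """GL-statistic of Example 1.8 (hypothetical helper name)."""
    v = np.sort([0.5 * (a - b) ** 2 for a, b in itertools.combinations(x, 2)])
    size = len(v)
    # U_n^{-1} equals v[i-1] on the interval ((i-1)/size, i/size].
    edges = np.arange(size + 1) / size
    # Length of each interval intersected with [0, p1], where J = 1.
    lengths = np.clip(edges[1:], 0.0, p1) - np.clip(edges[:-1], 0.0, p1)
    integral = float(np.sum(lengths * v))
    # Empirical U-quantile at p1: first index i with i/size >= p1.
    idx = np.searchsorted(edges[1:], p1, side="left")
    quantile = v[min(idx, size - 1)]
    return integral + b1 * quantile


x = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5])  # 8 points, 28 pairs
t_n = winsorized_variance(x)

# Cross-check: winsorize the sorted pairwise values at the 21st of 28.
v = np.sort([0.5 * (a - b) ** 2 for a, b in itertools.combinations(x, 2)])
print(t_n, np.mean(np.minimum(v, v[20])))
```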

Dependent Sequences of Random Variables 

While the theory of GL-statistics under independence has been studied by Serfling [33], there seem to be no results under dependence. But dependent random sequences are very common in applications. Strong mixing and near epoch dependence are widely used concepts to describe short range dependence.

Definition 1.9. Let $(X_n)_{n\in\mathbb{N}}$ be a stationary process. Then the strong mixing coefficient is given by

$$\alpha(k) = \sup\left\{|P(A \cap B) - P(A)P(B)| : A \in \mathcal{F}_1^n, B \in \mathcal{F}_{n+k}^{\infty}, n \in \mathbb{N}\right\},$$

where $\mathcal{F}_a^l$ is the $\sigma$-field generated by the random variables $X_a, \ldots, X_l$, and $(X_n)_{n\in\mathbb{N}}$ is called strongly mixing if $\alpha(k) \to 0$ as $k \to \infty$.

Strong mixing in the sense of $\alpha$-mixing is the weakest of the well-known strong mixing conditions, see Bradley [13]. But this class of weakly dependent processes is still too restrictive for many applications, as it excludes examples like linear processes with innovations that do not have a density or data from dynamical systems, see Andrews [1].

We will consider sequences which are near epoch dependent on absolutely regular processes, as this class covers linear processes and data from dynamical systems, which are deterministic except for the initial value. Let $T: [0, 1] \to [0, 1]$ be a piecewise smooth and expanding map such that $\inf_{x\in[0,1]} |T'(x)| > 1$. Then there is a stationary process $(X_n)_{n\in\mathbb{N}}$ such that $X_{n+1} = T(X_n)$ which can be represented as a functional of an absolutely regular process; for details see Hofbauer and Keller [23]. Linear processes (even with discrete innovations) and GARCH processes are also near epoch dependent, see Hansen [21]. Near epoch dependent random variables are also called approximating functionals (for example in Borovkova et al. [12]).

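Definition 1.10. Let $(X_n)_{n\in\mathbb{N}}$ be a stationary process.

1. The absolute regularity coefficient is given by

$$\beta(k) = \sup_{n\in\mathbb{N}} E \sup\left\{|P(A \mid \mathcal{F}_1^n) - P(A)| : A \in \mathcal{F}_{n+k}^{\infty}\right\},$$

and $(X_n)_{n\in\mathbb{N}}$ is called absolutely regular if $\beta(k) \to 0$ as $k \to \infty$.

2. We say that $(X_n)_{n\in\mathbb{N}}$ is $L^1$ near epoch dependent on a process $(Z_n)_{n\in\mathbb{Z}}$ with approximation constants $(a_l)_{l\in\mathbb{N}}$ if

$$E\left|X_1 - E(X_1 \mid \mathcal{G}_{-l}^{l})\right| \le a_l, \quad l = 0, 1, 2, \ldots,$$

where $\lim_{l\to\infty} a_l = 0$ and $\mathcal{G}_{-l}^{l}$ is the $\sigma$-field generated by $Z_{-l}, \ldots, Z_l$.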

In the literature one often finds $L^2$ near epoch dependence (where the $L^1$ norm in the second part of Definition 1.10 is replaced by the $L^2$ norm), but this requires second moments and we are interested in robust estimation. So we want to allow heavier tails and consider $L^1$ near epoch dependence. Furthermore, we do not require that the underlying process is independent; it only has to be weakly dependent in the sense of absolute regularity.

Assumption 4. Let one of the following two conditions hold: 

1. $(X_n)_{n\in\mathbb{N}}$ is strongly mixing with mixing coefficients $\alpha(n) = O(n^{-\alpha})$ for $\alpha > 8$ and
E\Xi\ r < oo for a r > |. 

2. $(X_n)_{n\in\mathbb{N}}$ is near epoch dependent on an absolutely regular process with mixing coefficients $\beta(n) = O(n^{-\beta})$ for $\beta > 8$ with approximation constants $a(n) = O(n^{-a})$ for $a = \max\{\beta + 3, 12\}$.



Kiefer-Müller processes

For independent random variables $(X_n)_{n\in\mathbb{N}}$ that are uniformly distributed on $[0, 1]$, Müller [29] determined the limit distribution of the empirical process

$$\left(\frac{1}{\sqrt{n}} \sum_{1 \le i \le ns} \left(1_{\{X_i \le t\}} - t\right)\right)_{t,s\in[0,1]}.$$

It converges weakly towards a Gaussian process $(K(t, s))_{t,s\in[0,1]}$ with covariance function $E K(t, s) K(t', s') = \min\{s, s'\}(\min\{t, t'\} - tt')$. Kiefer [25] proved an almost sure invariance principle: After enlarging the probability space, there exists a copy of the Kiefer-Müller process $K$ such that the empirical process and $K$ are close together with respect to the supremum norm. Berkes and Philipp [11] extended this to dependent random variables. For sample quantiles, Csörgő and Révész [15] established a strong invariance principle, but only under independence. We will extend this to dependent data and to U-quantiles.

A strong invariance principle is a very useful asymptotic theorem, as the limit behaviour of Gaussian processes is well understood and it is then possible to conclude that the approximated process has the same asymptotic properties. Note that a Kiefer-Müller process can be described as a functional Brownian motion, as its increments in the $s$ direction are independent Brownian bridges. We have the following scaling behaviour: $\left(\frac{1}{\sqrt{n}} K(t, ns)\right)_{s,t\in[0,1]}$ has the same distribution as $(K(t, s))_{s,t\in[0,1]}$.
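Since a centered Gaussian process is determined by its covariance function, the scaling behaviour can be verified in one line. The following worked computation only uses the covariance function of the Kiefer-Müller process given in the text:

```latex
\[
E\left[\tfrac{1}{\sqrt{n}} K(t, ns) \cdot \tfrac{1}{\sqrt{n}} K(t', ns')\right]
  = \tfrac{1}{n} \min\{ns, ns'\} \left(\min\{t, t'\} - tt'\right)
  = \min\{s, s'\} \left(\min\{t, t'\} - tt'\right)
  = E\left[K(t, s) K(t', s')\right].
\]
```

The same computation goes through for any covariance of the form $\min\{s, s'\}\,\Gamma(t, t')$.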

Furthermore, a functional LIL holds: The sequence

$$\left(\frac{1}{\sqrt{2n \log\log n}} K(t, ns)\right)_{s,t\in[0,1]}, \quad n \in \mathbb{N},$$

is almost surely relatively compact (with respect to the supremum norm). The limit set is the unit ball of the reproducing kernel Hilbert space associated with the covariance function of the process $(K(t, s))_{s,t\in[0,1]}$. For details about the reproducing kernel Hilbert space, see Aronszajn [8] or Lai [27].

2 Main Results 
Empirical U-Process

The asymptotic theory for the empirical U-process makes use of the Hoeffding decomposition; recall that $h_1(x, t) := E[h(x, Y, t)] - U(t)$. Under Assumptions 1, 2 and 4, the following covariance function converges absolutely and is continuous (compare Theorem 5 of Borovkova et al. [12]):

$$\Gamma(t, t') = 4 \operatorname{Cov}[h_1(X_1, t), h_1(X_1, t')] + 4 \sum_{k=1}^{\infty} \operatorname{Cov}[h_1(X_1, t), h_1(X_{k+1}, t')] + 4 \sum_{k=1}^{\infty} \operatorname{Cov}[h_1(X_{k+1}, t), h_1(X_1, t')].$$



Theorem 1. Under Assumptions 1, 2 and 4, there exists a centered Gaussian process $(K(t, s))_{t,s\in\mathbb{R}}$ (after enlarging the probability space if necessary) with covariance function

$$E K(t, s) K(t', s') = \min\{s, s'\}\, \Gamma(t, t')$$

such that almost surely

$$\sup_{t\in\mathbb{R},\, s\in[0,1]} \frac{1}{\sqrt{n}} \left|\lfloor ns \rfloor \left(U_{\lfloor ns \rfloor}(t) - U(t)\right) - K(t, ns)\right| = O\left(\log^{-\frac{1}{3840}} n\right).$$

The rate of convergence to zero in this theorem is very slow, but it is the same as in Berkes and Philipp [11], as we rely heavily on their method of proof. By the scaling property of the process $K$, we obtain the asymptotic distribution of $U_{\lfloor ns \rfloor}(t)$, and by Theorem 2.3 of Arcones [3] a functional LIL:



Corollary 1. Under Assumptions 1, 2 and 4, the empirical U-process

$$\left(\frac{\lfloor ns \rfloor}{\sqrt{n}} \left(U_{\lfloor ns \rfloor}(t) - U(t)\right)\right)_{t\in\mathbb{R},\, s\in[0,1]}$$

converges weakly in the space $D(\mathbb{R} \times [0, 1])$ (equipped with the supremum norm) to the centered Gaussian process $(K(t, s))_{t,s\in\mathbb{R}}$ introduced in Theorem 1. The sequence

$$\left(\left(\frac{\lfloor ns \rfloor}{\sqrt{2n \log\log n}} \left(U_{\lfloor ns \rfloor}(t) - U(t)\right)\right)_{t\in\mathbb{R},\, s\in[0,1]}\right)_{n\in\mathbb{N}}$$

is almost surely relatively compact in the space $D(\mathbb{R} \times [0, 1])$ (equipped with the supremum norm) and the limit set is the unit ball of the reproducing kernel Hilbert space associated with the covariance function of the process $K$.

The first part of this corollary is very similar to Theorem 9 of Borovkova et al. [12] (they use a continuity condition that is different from our Assumption 2). To our knowledge, part 2 is the first functional LIL for empirical U-processes under dependence.

Generalized Bahadur Representation 

Recall that the remainder term in the generalized Bahadur representation is defined as

$$R_n(p) = U_n^{-1}(p) - t_p - \frac{p - U_n(t_p)}{u(t_p)}$$

and that we write $t_p := U^{-1}(p)$. We set $U_0^{-1}(p) := 0$, as it is not possible to find a generalized inverse of $U_0 = 0$.

Theorem 2. Under Assumptions 1, 2, 3 and 4,

$$\sup_{p\in I,\, s\in[0,1]} \frac{\lfloor ns \rfloor}{\sqrt{n}} \left|R_{\lfloor ns \rfloor}(p)\right| = o\left(n^{-\frac{\gamma}{8}} \log n\right)$$

almost surely with $I = [\tilde{C}_1, \tilde{C}_2]$, where $U(C_1) < \tilde{C}_1 < \tilde{C}_2 < U(C_2)$, $\gamma := \ldots$ (if the first part of Assumption 4 holds), respectively $\gamma := \ldots$ (if the second part of Assumption 4 holds).

Note that for a fast decay of the mixing coefficients, the rate becomes close to $n^{-\frac{1}{8}}$, while the optimal rate for sample quantiles of independent data is $n^{-\frac{1}{4}} (\log n)^{\frac{1}{2}} (\log\log n)^{\frac{1}{4}}$.

Empirical [/-Quantiles and GL-Statistics 

Using the Bahadur representation, we can deduce the asymptotic behaviour of the empirical U-quantile process from Theorem 1.

Theorem 3. Under Assumptions 1, 2, 3 and 4, there exists a centered Gaussian process $(K'(p, s))_{p\in I,\, s\in\mathbb{R}}$ (after enlarging the probability space if necessary), where $I$ is the interval introduced in Theorem 2, with covariance function

$$E K'(p, s) K'(p', s') = \min\{s, s'\}\, \frac{\Gamma(t_p, t_{p'})}{u(t_p)\, u(t_{p'})}$$

such that

$$\sup_{p\in I,\, s\in[0,1]} \frac{1}{\sqrt{n}} \left|\lfloor ns \rfloor \left(U_{\lfloor ns \rfloor}^{-1}(p) - t_p\right) - K'(p, ns)\right| = O\left(\log^{-\frac{1}{3840}} n\right).$$



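$K'$ is a Gaussian process with independent increments in the $s$ direction, so we have the following consequences:

Corollary 2. Under Assumptions 1, 2, 3 and 4, the empirical U-quantile process

$$\left(\frac{\lfloor ns \rfloor}{\sqrt{n}} \left(U_{\lfloor ns \rfloor}^{-1}(p) - t_p\right)\right)_{p\in I,\, s\in[0,1]}$$

converges weakly in the space $D(I \times [0, 1])$ (equipped with the supremum norm) to the centered Gaussian process $(K'(p, s))_{p\in I,\, s\in\mathbb{R}}$ introduced in Theorem 3. The sequence

$$\left(\left(\frac{\lfloor ns \rfloor}{\sqrt{2n \log\log n}} \left(U_{\lfloor ns \rfloor}^{-1}(p) - t_p\right)\right)_{p\in I,\, s\in[0,1]}\right)_{n\in\mathbb{N}}$$

is almost surely relatively compact in the space $D(I \times [0, 1])$ (equipped with the supremum norm) and the limit set is the unit ball of the reproducing kernel Hilbert space associated with the covariance function of the process $K'$.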



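As GL-statistics are linear functionals of the empirical U-quantile process, we get an approximation for $T_n$:

Theorem 4. Let $p_1, \ldots, p_d \in I$ and let $J$ be a bounded function that is continuous a.e. and vanishes outside of $I$. Under Assumptions 1, 2, 3 and 4, there exists (after enlarging the probability space if necessary) a Brownian motion $B$ such that for $T_n$ defined in Definition 1.6 and

$$\sigma^2 = \int_{\tilde{C}_1}^{\tilde{C}_2}\!\int_{\tilde{C}_1}^{\tilde{C}_2} \frac{\Gamma(t_p, t_q)}{u(t_p)\, u(t_q)} J(p) J(q)\, dp\, dq + 2 \sum_{j=1}^{d} b_j \int_{\tilde{C}_1}^{\tilde{C}_2} \frac{\Gamma(t_p, t_{p_j})}{u(t_p)\, u(t_{p_j})} J(p)\, dp + \sum_{i,j=1}^{d} b_i b_j \frac{\Gamma(t_{p_i}, t_{p_j})}{u(t_{p_i})\, u(t_{p_j})}$$

we have that

$$\sup_{s\in[0,1]} \frac{1}{\sqrt{n}} \left|\lfloor ns \rfloor \left(T_{\lfloor ns \rfloor} - T(U^{-1})\right) - \sigma B(ns)\right| = O\left(\log^{-\frac{1}{3840}} n\right)$$

almost surely.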

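By the well-known properties of Brownian motions, we have:

Corollary 3. Let $p_1, \ldots, p_d \in I$ and let $J$ be a bounded function. Under Assumptions 1, 2, 3 and 4, for $T_n$ defined in Definition 1.6,

$$\left(\frac{\lfloor ns \rfloor}{\sqrt{n}} \left(T_{\lfloor ns \rfloor} - T(U^{-1})\right)\right)_{s\in[0,1]}$$

converges weakly to the Brownian motion $\sigma B(s)$ with $\sigma^2$ as in Theorem 4. Furthermore, we have that the sequence

$$\left(\left(\frac{\lfloor ns \rfloor}{\sqrt{2n \log\log n}} \left(T_{\lfloor ns \rfloor} - T(U^{-1})\right)\right)_{s\in[0,1]}\right)_{n\in\mathbb{N}}$$

is almost surely relatively compact in the space of bounded continuous functions $C[0, 1]$ (equipped with the supremum norm) and the limit set is

$$\left\{f: [0, 1] \to \mathbb{R} \;\middle|\; f(0) = 0,\ \int_0^1 f'^2(s)\, ds \le \sigma^2\right\}.$$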

3 Preliminary Results 

Proposition 3.1. Under Assumptions 1, 2 and 4, there exists a centered Gaussian process $(K(t, s))_{t,s\in\mathbb{R}}$ (after enlarging the probability space if necessary) with covariance function

$$E K(t, s) K(t', s') = \min\{s, s'\}\, \Gamma(t, t')$$

such that almost surely

$$\sup_{t\in\mathbb{R},\, s\in[0,1]} \frac{1}{\sqrt{n}} \left|2 \sum_{1 \le i \le ns} h_1(X_i, t) - K(t, ns)\right| = O\left(\log^{-\frac{1}{3840}} n\right).$$
Proof. This proposition is basically Theorem 1 of Berkes and Philipp [11], which we have to generalize in three aspects:

1. Berkes and Philipp assume that the covariance kernel $\Gamma$ is positive definite; we want to avoid this condition here.

2. Berkes and Philipp consider indicator functions $1_{\{x \le t\}}$, while in this version of the proposition, we deal with more general functions $E h(x, Y, t)$.

3. Theorem 1 of Berkes and Philipp is restricted to the distribution function $F(t) = E 1_{\{X_i \le t\}} = t$; we will extend this to a function $U$ satisfying our Assumption 1.

The mixing condition of Berkes and Philipp is the same as our Assumption 4.

1. In the proof of their Theorem 1, Berkes and Philipp use the fact that $\Gamma$ is positive definite only for two steps. Their Proposition 4.1 (page 124) also holds if this is not the case. It is easy to see that the characteristic functions of the finite dimensional distributions then might converge to 1 at some points, but with the required rate. Furthermore, we have to show (page 135) that for all $t_1, \ldots, t_{d_k} \in [0, 1]$, $P[\|(K(t_1, 1), \ldots, K(t_{d_k}, 1))\| \ge \frac{1}{4} T_k] \le \delta_k$, where $T_k$ and $\delta_k$ are defined in their article. Let $\Gamma_{d_k} = (\Gamma(t_i, t_j))_{1 \le i,j \le d_k}$ be the covariance matrix of $K(t_1, 1), \ldots, K(t_{d_k}, 1)$ and $\rho$ its biggest eigenvalue. We first consider the case that $\rho > 0$. As $\Gamma_{d_k}$ is symmetric and positive semidefinite, there exists a matrix $\Gamma_{d_k}^{1/2}$ such that $(\Gamma_{d_k}^{1/2})^t\, \Gamma_{d_k}^{1/2} = \Gamma_{d_k}$ and the vector $(K(t_1, 1), \ldots, K(t_{d_k}, 1))$ has the same distribution as $\Gamma_{d_k}^{1/2} (W_1, \ldots, W_{d_k})^t$, where $W_1, \ldots, W_{d_k}$ are independent standard normal random variables. So it follows that

$$P\left[\|(K(t_1, 1), \ldots, K(t_{d_k}, 1))\| \ge \tfrac{1}{4} T_k\right] = P\left[\|\Gamma_{d_k}^{1/2} (W_1, \ldots, W_{d_k})^t\| \ge \tfrac{1}{4} T_k\right] \le P\left[\sqrt{\rho}\, \|(W_1, \ldots, W_{d_k})^t\| \ge \tfrac{1}{4} T_k\right]$$
$$= (2\pi)^{-\frac{d_k}{2}} \int_{\|(x_1, \ldots, x_{d_k})\| \ge \frac{1}{4\sqrt{\rho}} T_k} \exp\left(-\tfrac{1}{2}(x_1^2 + \ldots + x_{d_k}^2)\right) dx_1 \ldots dx_{d_k}.$$

The rest of the proof is then exactly the same as in Berkes and Philipp [11]. In the case $\rho = 0$, we have that $\Gamma_{d_k} = 0$, so trivially $P[\|(K(t_1, 1), \ldots, K(t_{d_k}, 1))\| \ge \frac{1}{4} T_k] = 0 \le \delta_k$.



2. The proof uses different properties of the indicator functions. If the process $(X_n)_{n\in\mathbb{N}}$ is near epoch dependent with constants $(a_n)_{n\in\mathbb{N}}$, then as a consequence of Lemma 3.2.1 of Philipp [30] the process $(1_{\{X_n \le t\}})_{n\in\mathbb{N}}$ is near epoch dependent with constants $(\sqrt{a_n})_{n\in\mathbb{N}}$. The same holds for the sequence $(h_1(X_n, t))_{n\in\mathbb{N}}$ by Assumption 2 and Lemma 3.5 and 3.10 of Wendler [34].

Furthermore, $h$ and $U$ are non-decreasing in $t$. Berkes and Philipp used different moment properties, which also hold in our setting: $h_1(X_n, t)$ is bounded by 1 and $E|h_1(X_n, t) - h_1(X_n, t')| \le C|t - t'|$ for $t, t' \in \mathbb{R}$, so consequently for $m \ge 1$, $\|h_1(X_n, t)\|_m \le 1$ and $\|h_1(X_n, t) - h_1(X_n, t')\|_m \le C|t - t'|^{\frac{1}{m}}$. So this more general version can be proved along the lines of the proof in Berkes and Philipp [11].

3. If $U(t) = t$ does not hold, note that $E h(X, Y, t_p) = U(t_p) = p$ with $t_p = U^{-1}(p) := \inf\{t \in \mathbb{R} \mid U(t) \ge p\}$, because $U$ is continuous. Clearly, Assumptions 1 and 2 hold for $h(x, y, U^{-1}(p))$. Furthermore, notice that if $U(t) = U(s)$, we have that $h_1(X_i, t) = h_1(X_i, s)$ almost surely by monotonicity of $h$, so

$$\sum_{i=1}^{n} h_1(X_i, t) = \sum_{i=1}^{n} h_1(X_i, t_{U(t)})$$

almost surely. From the first two parts of the proof, we know that there is a centered Gaussian process $K^*$ with covariance function

$$E[K^*(p, s) K^*(p', s')] = \min\{s, s'\}\, \Gamma(t_p, t_{p'})$$

with

$$\sup_{p\in[0,1],\, s\in[0,1]} \frac{1}{\sqrt{n}} \left|2 \sum_{1 \le i \le ns} h_1(X_i, t_p) - K^*(p, ns)\right| = O\left(\log^{-\frac{1}{3840}} n\right)$$

almost surely. The Gaussian process $K$ with $K(t, s) = K^*(U(t), s)$ has the required covariance function and

$$\sup_{t\in\mathbb{R},\, s\in[0,1]} \frac{1}{\sqrt{n}} \left|2 \sum_{1 \le i \le ns} h_1(X_i, t) - K(t, ns)\right| = \sup_{t\in\mathbb{R},\, s\in[0,1]} \frac{1}{\sqrt{n}} \left|2 \sum_{1 \le i \le ns} h_1(X_i, t_{U(t)}) - K^*(U(t), ns)\right| = O\left(\log^{-\frac{1}{3840}} n\right).$$

□



Lemma 3.2. Let $C_3, C_4, L$ be positive constants. Under Assumption 4, part 1 (strong mixing), there exists a constant $C$ such that for all measurable, non-negative functions
g : R — > R that are bounded by C 3 with E \g (Xi) — Eg (Xi)\ > C i n~'^+ I and satisfy the 
variation condition with constant $L$, and all $n \in \mathbb{N}$ we have

$$E\left(\sum_{i=1}^{n} \left(g(X_i) - E[g(X_i)]\right)\right)^4 \le C n^2 (\log n)^2 \left(E|g(X_1)|\right)^{1+\gamma},$$

where $\gamma$ is defined in Theorem 2. The same statement holds under Assumption 4, part 2 (near epoch dependence on an absolutely regular sequence) for functions $g: \mathbb{R} \to \mathbb{R}$ with

E\g(X 1 )-Eg(X 1 )\>C 4 n-^. 

This is Lemma 3.4 respectively 3.6 of Wendler [34].

Lemma 3.3. Under Assumptions 1, 2 and 4, there exists a constant $C$ such that for all $t \in \mathbb{R}$ and all $n \in \mathbb{N}$

$$\sum_{i_1, j_1, i_2, j_2 = 1}^{n} \left|E\left[h_2(X_{i_1}, X_{j_1}, t)\, h_2(X_{i_2}, X_{j_2}, t)\right]\right| \le C n^2.$$

This is Lemma 4.4 of Dehling and Wendler [18].
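Lemma 3.4. Under Assumptions 1, 2 and 4,

$$\sup_{t\in\mathbb{R}} \left|\sum_{1 \le i < j \le n} h_2(X_i, X_j, t)\right| = o\left(n^{\frac{3}{2} - \frac{\gamma}{8}}\right)$$

almost surely with $\gamma$ as in Theorem 2.

In all our proofs, $C$ denotes a constant and may have different values from line to line.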

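Proof. Without loss of generality, we can assume that $U(t) = t$; otherwise we use the same transformation as in the proof of Proposition 3.1 and study the kernel function $h(x, y, U^{-1}(p))$. We define $Q_n(t) := \sum_{1 \le i < j \le n} h_2(X_i, X_j, t)$. For $l \in \mathbb{N}$, let $k = k_l = \lceil 2^{\frac{5}{8} l} \rceil$ and $t_{r,l} = \frac{r}{k_l}$ for $r = 0, \ldots, k_l$, so that $C n^{-\frac{5}{8}} \le t_{r,l} - t_{r-1,l} = \frac{1}{k_l} \le C' n^{-\frac{5}{8}}$ for all $n \in \mathbb{N}$ with $2^{l-1} \le n < 2^l$ and some constants $C$, $C'$. By Assumption 1, $h$ and $U$ are non-decreasing in $t$, so we have for any $t \in [t_{r-1,l}, t_{r,l}]$, $n < 2^l$

$$|Q_n(t)| = \left|\sum_{1 \le i < j \le n} \left(h(X_i, X_j, t) - h_1(X_i, t) - h_1(X_j, t) - U(t)\right)\right|$$
$$\le \max\left\{\left|\sum_{1 \le i < j \le n} \left(h(X_i, X_j, t_{r,l}) - h_1(X_i, t) - h_1(X_j, t) - U(t)\right)\right|,\ \left|\sum_{1 \le i < j \le n} \left(h(X_i, X_j, t_{r-1,l}) - h_1(X_i, t) - h_1(X_j, t) - U(t)\right)\right|\right\}$$
$$\le \max\left\{|Q_n(t_{r,l})|, |Q_n(t_{r-1,l})|\right\} + (n-1) \max\left\{\left|\sum_{i=1}^{n} \left(h_1(X_i, t_{r,l}) - h_1(X_i, t)\right)\right|,\ \left|\sum_{i=1}^{n} \left(h_1(X_i, t) - h_1(X_i, t_{r-1,l})\right)\right|\right\} + \frac{n(n-1)}{2} \left|U(t_{r,l}) - U(t_{r-1,l})\right|$$
$$\le \max\left\{|Q_n(t_{r,l})|, |Q_n(t_{r-1,l})|\right\} + (n-1) \left|\sum_{i=1}^{n} \left(h_1(X_i, t_{r,l}) - h_1(X_i, t_{r-1,l})\right)\right| + 2\, \frac{n(n-1)}{2} \left|U(t_{r,l}) - U(t_{r-1,l})\right|.$$

So we have that

$$\sup_{t\in\mathbb{R}} |Q_n(t)| \le \max_{r=0,\ldots,k} |Q_n(t_{r,l})| + \max_{r=1,\ldots,k} (n-1) \left|\sum_{i=1}^{n} \left(h_1(X_i, t_{r,l}) - h_1(X_i, t_{r-1,l})\right)\right| + \max_{r=1,\ldots,k} n(n-1) \left|U(t_{r,l}) - U(t_{r-1,l})\right|.$$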

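We will treat these three summands separately. By the choice of $t_{0,l}, \ldots, t_{k_l,l}$, we have $|U(t_{r,l}) - U(t_{r-1,l})| = t_{r,l} - t_{r-1,l} = \frac{1}{k_l}$, so for the last summand and $2^{l-1} \le n < 2^l$ we know that $\max_{r=1,\ldots,k} n(n-1) |U(t_{r,l}) - U(t_{r-1,l})| \le C n^{2 - \frac{5}{8}} = o(n^{\frac{3}{2} - \frac{\gamma}{8}})$. For the first summand, we obtain by arguments similar to those used by Wu [36] to prove inequality (6) of his Proposition 1, or by Dehling and Wendler [18] to prove their line (5),

$$E\left[\max_{n=2^{l-1},\ldots,2^l-1}\ \max_{r=0,\ldots,k} |Q_n(t_{r,l})|^2\right] \le \sum_{r=0}^{k} E\left[\left(\sum_{d=1}^{l} \max_{i=1,\ldots,2^{l-d}} \left|Q_{i 2^{d-1}}(t_{r,l}) - Q_{(i-1) 2^{d-1}}(t_{r,l})\right|\right)^2\right]$$
$$\le \sum_{r=0}^{k} l \sum_{d=1}^{l} \sum_{i=1}^{2^{l-d}} E\left(Q_{i 2^{d-1}}(t_{r,l}) - Q_{(i-1) 2^{d-1}}(t_{r,l})\right)^2 \le \sum_{r=0}^{k} l \sum_{d=1}^{l} \sum_{i_1, j_1, i_2, j_2 = 1}^{2^l} \left|E\left[h_2(X_{i_1}, X_{j_1}, t_{r,l})\, h_2(X_{i_2}, X_{j_2}, t_{r,l})\right]\right|$$
$$\le C k l^2 2^{2l} \le C l^2 2^{(2 + \frac{5}{8}) l},$$

where we used Lemma 3.3 in the last line. With the Chebyshev inequality, it follows for every $\epsilon > 0$ that

$$\sum_{l=1}^{\infty} P\left[\max_{n=2^{l-1},\ldots,2^l-1}\ \max_{r=0,\ldots,k} |Q_n(t_{r,l})| > \epsilon\, 2^{l(\frac{3}{2} - \frac{\gamma}{8})}\right] \le \sum_{l=1}^{\infty} \frac{1}{\epsilon^2 2^{l(3 - \frac{\gamma}{4})}} E\left[\max_{n=2^{l-1},\ldots,2^l-1}\ \max_{r=0,\ldots,k} |Q_n(t_{r,l})|^2\right] \le C \sum_{l=1}^{\infty} \frac{l^2\, 2^{(2 + \frac{5}{8}) l}}{\epsilon^2 2^{l(3 - \frac{\gamma}{4})}} < \infty,$$

as $\gamma < 1$, so by the Borel-Cantelli lemma

$$P\left[\max_{n=2^{l-1},\ldots,2^l-1}\ \max_{r=0,\ldots,k} |Q_n(t_{r,l})| > \epsilon\, 2^{l(\frac{3}{2} - \frac{\gamma}{8})} \text{ i.o.}\right] = 0
$$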



(the abbreviation i.o. stands for "infinitely often"). It remains to show the convergence of the second summand:



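$$E\left[\left(\max_{n=2^{l-1},\ldots,2^l-1}\ \max_{r=1,\ldots,k} (n-1) \left|\sum_{i=1}^{n} \left(h_1(X_i, t_{r,l}) - h_1(X_i, t_{r-1,l})\right)\right|\right)^4\right] \le 2^{4l} \sum_{r=1}^{k} E\left[\left(\max_{n=2^{l-1},\ldots,2^l-1} \left|\sum_{i=1}^{n} \left(h_1(X_i, t_{r,l}) - h_1(X_i, t_{r-1,l})\right)\right|\right)^4\right]$$
$$\le C 2^{6l} l^2 k \max_{r=1,\ldots,k} |t_{r,l} - t_{r-1,l}|^{1+\gamma} \le C l^2 2^{(6 - \frac{5}{8}\gamma) l},$$

where we used Corollary 1 of Móricz and Lemma 3.2 to obtain the last line. Remember that $k = k_l = \lceil 2^{\frac{5}{8} l} \rceil$ and that $|t_{r,l} - t_{r-1,l}| \ge C n^{-\frac{5}{8}}$. We conclude that

$$\sum_{l=0}^{\infty} P\left[\max_{n=2^{l-1},\ldots,2^l-1}\ \max_{r=1,\ldots,k} (n-1) \left|\sum_{i=1}^{n} \left(h_1(X_i, t_{r,l}) - h_1(X_i, t_{r-1,l})\right)\right| > \epsilon\, 2^{l(\frac{3}{2} - \frac{\gamma}{8})}\right]$$
$$\le \sum_{l=0}^{\infty} \frac{1}{\epsilon^4 2^{l(6 - \frac{\gamma}{2})}} E\left[\left(\max_{n=2^{l-1},\ldots,2^l-1}\ \max_{r=1,\ldots,k} (n-1) \left|\sum_{i=1}^{n} \left(h_1(X_i, t_{r,l}) - h_1(X_i, t_{r-1,l})\right)\right|\right)^4\right]$$
$$\le \sum_{l=0}^{\infty} \frac{C l^2 2^{(6 - \frac{5}{8}\gamma) l}}{\epsilon^4 2^{l(6 - \frac{\gamma}{2})}} = \frac{C}{\epsilon^4} \sum_{l=0}^{\infty} l^2 2^{-\frac{\gamma}{8} l} < \infty.$$

The Borel-Cantelli lemma completes the proof.

□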



Lemma 3.5. Let $F$ be a non-decreasing function, $c, l > 0$ constants and $[C_1, C_2] \subset \mathbb{R}$. If for all $t, t' \in [C_1, C_2]$ with $|t - t'| \le l + 2c$

$$|F(t) - F(t') - (t - t')| \le c,$$

then for all $p, p' \in \mathbb{R}$ with $|p - p'| \le l$ and $F^{-1}(p), F^{-1}(p') \in (C_1 + 2c + l, C_2 - 2c - l)$

$$|F^{-1}(p) - F^{-1}(p') - (p - p')| \le c,$$

where $F^{-1}(p) := \inf\{t \mid F(t) \ge p\}$ is the generalized inverse.

Proof. Without loss of generality we assume that $p < p'$. Let $\epsilon \in (0, c)$. By our assumptions

$$F\left(F^{-1}(p) + (p' - p) + c + \epsilon\right) \ge F\left(F^{-1}(p) + \epsilon\right) + (p' - p) + c - c \ge p + (p' - p) = p'.$$

By the definition of $F^{-1}$, it follows that

$$F^{-1}(p') = \inf\{t \mid F(t) \ge p'\} \le F^{-1}(p) + (p' - p) + c + \epsilon.$$

So taking the limit $\epsilon \to 0$, we obtain

$$F^{-1}(p') \le F^{-1}(p) + (p' - p) + c.$$

On the other hand

$$F\left(F^{-1}(p) + (p' - p) - c - \epsilon\right) \le F\left(F^{-1}(p) - \epsilon\right) + (p' - p) - c + c < p + (p' - p) = p'.$$

So we have that

$$F^{-1}(p') \ge F^{-1}(p) + (p' - p) - c - \epsilon,$$

and hence $F^{-1}(p') \ge F^{-1}(p) + (p' - p) - c$. Combining the upper and lower inequality for $F^{-1}(p')$, we conclude that $|F^{-1}(p) - F^{-1}(p') - (p - p')| \le c$. □

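Lemma 3.6. Under Assumptions 1, 2, 3 and 4, for any constant $C > 0$

$$\sup_{t, t' \in [C_1, C_2]:\ |t - t'| \le C\sqrt{\frac{\log\log n}{n}}} \left|U_n(t) - U_n(t') - u(t)(t - t')\right| = o\left(n^{-\frac{1}{2} - \frac{\gamma}{8}} \log n\right).$$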

Proof. As a consequence of Assumption 3 and $\gamma < 1$

$$\sup_{t, t' \in [C_1, C_2]:\ |t - t'| \le C\sqrt{\frac{\log\log n}{n}}} \left|U(t) - U(t') - u(t)(t - t')\right| = o\left(n^{-\frac{1}{2} - \frac{\gamma}{8}} \log n\right),$$

so it suffices to show that

$$\sup_{t, t' \in [C_1, C_2]:\ |t - t'| \le C\sqrt{\frac{\log\log n}{n}}} \left|U_n(t) - U_n(t') - (U(t) - U(t'))\right| = o\left(n^{-\frac{1}{2} - \frac{\gamma}{8}} \log n\right).$$

Without loss of generality, we can assume that $U(t) = t$; otherwise we use the same transformation as in the proof of Proposition 3.1 and study the kernel function $h(x, y, U^{-1}(p))$. Note that in this case, we can consider the supremum over $[0, 1]$. Furthermore, we will consider only the case $C = 1$, that is, we will prove

$$K_n := \sup_{t, t' \in [0, 1]:\ |t - t'| \le \sqrt{\frac{\log\log n}{n}}} \left|U_n(t) - U_n(t') - (t - t')\right| = o\left(n^{-\frac{1}{2} - \frac{\gamma}{8}} \log n\right).$$

For I 6 IN, let k = h = C2^ l - log] ° sl ^ , so that for all n = 2 l ~\ ... ,2 l - 1, we have 



that J 1 ^^ <i t < Cy ^Jf^. We define for r = 0, . . . , fc, the real numbers t r>l := 
Clearly 



$$K_n \le 2 \max_{r=1,\ldots,k}\ \sup_{t, t' \in [t_{r-1,l}, t_{r,l}]} \left|U_n(t) - U_n(t') - (t - t')\right| \le 4 \max_{r=1,\ldots,k}\ \sup_{t \in [t_{r-1,l}, t_{r,l}]} \left|U_n(t) - U_n(t_{r-1,l}) - (t - t_{r-1,l})\right|.$$
Now choose $m = m_l \in \mathbb{N}$ such that $m_l k_l \approx \ldots$ So for all $n = 2^{l-1}, \ldots, 2^l - 1$ and some

constants C, C", we have that Cn~2~a < T — < C'n" - ^ . We define for r = 1, .... ki 

— kirrii — iii 

and r* = 0, . . . ,rrii the real numbers t*+ rl = t r ^ + As £/" n and [/ are non-decreasing, 
we have for t G (£**_i, r , z , £* V)Z ) 

|cr„(t) -?/„(*,._!,,) -(t-t r _i,i) I 

< max { | U n (t* r , r l ) - U n (t r -i,i) ~{t- t r -i,i) | , 

| U n (t**_ 1>rjl ) — U n (t r _ 1: i) — (t — tr-ij) | } 

< max{\U n (t^ trt i) - U n (t r -ui) - (** Vii - tr— | , 

\U n (t**-l,r,l) ~ U n (t r ~l,l) — (£**-l,r,J — *r-l,i) | } + IV,r,Z — V*-l,r,zl> 

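and consequently

\[ \begin{aligned}
K_n &\le 4 \max_{r=1,\ldots,k}\ \max_{r^*=1,\ldots,m} |U_n(t^*_{r^*,r,l}) - U_n(t_{r-1,l}) - (t^*_{r^*,r,l} - t_{r-1,l})| + 4 \max_{r=1,\ldots,k}\ \max_{r^*=1,\ldots,m} |t^*_{r^*,r,l} - t^*_{r^*-1,r,l}| \\
&\le 8 \max_{r=1,\ldots,k}\ \max_{r^*=1,\ldots,m} \Big| \frac{1}{n} \sum_{1 \le i \le n} h_1(X_i, t^*_{r^*,r,l}) - \frac{1}{n} \sum_{1 \le i \le n} h_1(X_i, t_{r-1,l}) \Big| \\
&\quad + 4 \max_{r=1,\ldots,k}\ \max_{r^*=1,\ldots,m} \Big| \frac{2}{n(n-1)} \Big( \sum_{1 \le i<j \le n} h_2(X_i, X_j, t^*_{r^*,r,l}) - \sum_{1 \le i<j \le n} h_2(X_i, X_j, t_{r-1,l}) \Big) \Big| \\
&\quad + 4 \max_{r=1,\ldots,k}\ \max_{r^*=1,\ldots,m} |t^*_{r^*,r,l} - t^*_{r^*-1,r,l}|.
\end{aligned} \]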

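By our construction of the numbers $t^*_{r^*,r,l}$, we have that $t^*_{r^*,r,l} - t^*_{r^*-1,r,l} = \frac{1}{k_l m_l}$ and obtain for all $n = 2^{l-1}, \ldots, 2^l - 1$

\[ \max_{r=1,\ldots,k}\ \max_{r^*=1,\ldots,m} |t^*_{r^*,r,l} - t^*_{r^*-1,r,l}| \le \sup_{t \in [C_1,C_2]} u(t)\, \frac{1}{k_l m_l} \le C n^{-\frac{1}{2}-\frac{\gamma}{8}} = o\big(n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n\big). \]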



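With the help of Lemma 3.4, it follows that

\[ \begin{aligned}
&\max_{r=1,\ldots,k}\ \max_{r^*=1,\ldots,m} \Big| \frac{2}{n(n-1)} \Big( \sum_{1 \le i<j \le n} h_2(X_i, X_j, t^*_{r^*,r,l}) - \sum_{1 \le i<j \le n} h_2(X_i, X_j, t_{r-1,l}) \Big) \Big| \\
&\quad \le \frac{4}{n(n-1)} \sup_{t \in \mathbb{R}} \Big| \sum_{1 \le i<j \le n} h_2(X_i, X_j, t) \Big| = o\big(n^{-\frac{1}{2}-\frac{\gamma}{8}}\big).
\end{aligned} \]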


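Furthermore, we have for the linear part by Lemma 3.2 and Corollary 1 of Móricz (which gives moment bounds for the maximum of multidimensional partial sums)

\[ \begin{aligned}
&E \Big( \max_{n=2^{l-1},\ldots,2^l-1}\ \max_{r=1,\ldots,k}\ \max_{r^*=1,\ldots,m} \Big| \sum_{i=1}^{n} h_1(X_i, t^*_{r^*,r,l}) - \sum_{i=1}^{n} h_1(X_i, t_{r-1,l}) \Big| \Big)^4 \\
&\quad \le \sum_{r=1}^{k} E \Big( \max_{n=2^{l-1},\ldots,2^l-1}\ \max_{m_1=1,\ldots,m} \Big| \sum_{i=1}^{n} \sum_{r^*=1}^{m_1} \big( h_1(X_i, t^*_{r^*,r,l}) - h_1(X_i, t^*_{r^*-1,r,l}) \big) \Big| \Big)^4 \\
&\quad \le C k\, 2^{2l} l^2 \Big( \sqrt{\frac{\log l}{2^l}} \Big)^{1+\gamma},
\end{aligned} \]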

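as $E|h_1(X_i, t) - h_1(X_i, t')| \le |t - t'|$ and by our construction $t^*_{m,r,l} - t^*_{0,r,l} = t_{r,l} - t_{r-1,l} = \frac{1}{k} \le C\sqrt{\frac{\log l}{2^l}}$. So we can conclude that for any $\epsilon > 0$

\[ \begin{aligned}
&\sum_{l=1}^{\infty} P \Big( \max_{n=2^{l-1},\ldots,2^l-1}\ \max_{r=1,\ldots,k}\ \max_{r^*=1,\ldots,m} \Big| \sum_{i=1}^{n} \big( h_1(X_i, t^*_{r^*,r,l}) - h_1(X_i, t_{r-1,l}) \big) \Big| > \epsilon\, l\, 2^{l(\frac{1}{2}-\frac{\gamma}{8})} \Big) \\
&\quad \le C \sum_{l=1}^{\infty} \frac{2^{l(2-\frac{\gamma}{2})}\, l^2\, (\log l)^{\frac{\gamma}{2}}}{\epsilon^4\, l^4\, 2^{l(2-\frac{\gamma}{2})}} \le C \sum_{l=1}^{\infty} \frac{(\log l)^{\frac{\gamma}{2}}}{l^2} < \infty.
\end{aligned} \]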



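With the Borel-Cantelli lemma, it follows that

\[ \max_{r=1,\ldots,k}\ \max_{r^*=1,\ldots,m} \Big| \sum_{1 \le i \le n} h_1(X_i, t^*_{r^*,r,l}) - \sum_{1 \le i \le n} h_1(X_i, t_{r-1,l}) \Big| = o\big(n^{\frac{1}{2}-\frac{\gamma}{8}} \log n\big) \]

almost surely and finally

\[ \max_{r=1,\ldots,k}\ \max_{r^*=1,\ldots,m} \Big| \frac{1}{n} \sum_{1 \le i \le n} h_1(X_i, t^*_{r^*,r,l}) - \frac{1}{n} \sum_{1 \le i \le n} h_1(X_i, t_{r-1,l}) \Big| = o\big(n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n\big). \]

□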
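
The monotonicity step in the proof above (bounding the centered increment at an arbitrary $t$ by its values at the two neighbouring grid points plus the mesh width) is a deterministic inequality and can be checked numerically. The Python sketch below does this under simplifying assumptions that are not taken from the paper: an ordinary empirical distribution function of a uniform sample stands in for $U_n$ (so that $U(t) = t$), and the grid, the base point `a` and all function names are hypothetical.

```python
import bisect
import random


def make_edf(sample):
    """Empirical distribution function of the sample (non-decreasing step function)."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect.bisect_right(xs, t) / n


def centered_increment(F, a, t):
    """F(t) - F(a) - (t - a), the quantity controlled in the proof when U(t) = t."""
    return F(t) - F(a) - (t - a)


def grid_bound(F, a, lo, hi):
    """Bound for |centered_increment(F, a, t)| valid for every t in [lo, hi]."""
    at_ends = max(abs(centered_increment(F, a, lo)), abs(centered_increment(F, a, hi)))
    return at_ends + (hi - lo)  # grid values plus mesh width


random.seed(1)
F = make_edf([random.random() for _ in range(500)])
a, width, m = 0.2, 0.1, 10  # coarse cell [a, a + width] split into m fine cells
worst_slack = float("inf")
for j in range(1, m + 1):
    lo, hi = a + width * (j - 1) / m, a + width * j / m
    for _ in range(200):
        t = random.uniform(lo, hi)
        slack = grid_bound(F, a, lo, hi) - abs(centered_increment(F, a, t))
        worst_slack = min(worst_slack, slack)
print("smallest slack over all sampled t:", worst_slack)
```

Because `F` is non-decreasing, the slack is never negative, which is why the supremum over a whole interval can be replaced by a maximum over finitely many grid points at the cost of one mesh width.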



4 Proof of Main Results 

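In all our proofs, $C$ denotes a constant and may have different values from line to line.

Proof of Theorem 1. We use the Hoeffding decomposition

\[ U_n(t) = U(t) + \frac{2}{n} \sum_{i=1}^{n} h_1(X_i, t) + \frac{2}{n(n-1)} \sum_{1 \le i<j \le n} h_2(X_i, X_j, t). \]

Let $K$ be a Gaussian process as in Proposition 3.1. Then

\[ \begin{aligned}
&\sup_{\substack{t \in \mathbb{R} \\ s \in [0,1]}} \frac{1}{\sqrt{n}} \Big| \lfloor ns \rfloor \big( U_{\lfloor ns \rfloor}(t) - U(t) \big) - K(t, ns) \Big| \\
&\quad \le \sup_{\substack{t \in \mathbb{R} \\ s \in [0,1]}} \frac{1}{\sqrt{n}} \Big| 2 \sum_{1 \le i \le ns} h_1(X_i, t) - K(t, ns) \Big| + \sup_{\substack{t \in \mathbb{R} \\ s \in [0,1]}} \frac{1}{\sqrt{n}} \Big| \frac{2}{\lfloor ns \rfloor - 1} \sum_{1 \le i<j \le ns} h_2(X_i, X_j, t) \Big| \\
&\quad = O\big(\log^{-\frac{1}{3840}} n\big)
\end{aligned} \]

as by Lemma 3.4, we have

\[ \sup_{\substack{t \in \mathbb{R} \\ s \in [0,1]}} \frac{1}{\sqrt{n}} \Big| \frac{2}{\lfloor ns \rfloor - 1} \sum_{1 \le i<j \le ns} h_2(X_i, X_j, t) \Big| \le C n^{-\frac{\gamma}{8}} \sup_{\substack{t \in \mathbb{R} \\ n'=1,\ldots,n}} \frac{1}{(n')^{\frac{3}{2}-\frac{\gamma}{8}}} \Big| \sum_{1 \le i<j \le n'} h_2(X_i, X_j, t) \Big| = O\big(n^{-\frac{\gamma}{8}}\big). \]

□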
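
The Hoeffding decomposition used in the proof of Theorem 1 is an exact algebraic identity for every sample, which the following Python sketch verifies numerically. It is only an illustration under assumptions that are not the paper's setting: the kernel is taken to be $h(x,y,t) = 1\{|x-y| \le t\}$, and $U$, $h_1$ are computed in closed form for independent standard normal observations (the paper treats dependent sequences). All function names are hypothetical.

```python
import math
import random


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def h(x, y, t):
    """Assumed kernel: indicator that |x - y| <= t."""
    return 1.0 if abs(x - y) <= t else 0.0


def U(t):
    """U(t) = P(|X - Y| <= t) for independent N(0,1) variables; X - Y ~ N(0, 2)."""
    return 2.0 * Phi(t / math.sqrt(2.0)) - 1.0


def h1(x, t):
    """Linear part: E h(x, Y, t) - U(t)."""
    return Phi(x + t) - Phi(x - t) - U(t)


def h2(x, y, t):
    """Degenerate part: h - h1(x) - h1(y) - U."""
    return h(x, y, t) - h1(x, t) - h1(y, t) - U(t)


def hoeffding_terms(xs, t):
    """Return (U_n(t), U(t), linear term, degenerate term) of the decomposition."""
    n = len(xs)
    pairs = [(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n)]
    u_n = 2.0 / (n * (n - 1)) * sum(h(x, y, t) for x, y in pairs)
    linear = 2.0 / n * sum(h1(x, t) for x in xs)
    degenerate = 2.0 / (n * (n - 1)) * sum(h2(x, y, t) for x, y in pairs)
    return u_n, U(t), linear, degenerate


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
u_n, u, linear, degenerate = hoeffding_terms(sample, 1.0)
print("U_n(t)               :", u_n)
print("U + linear + degen.  :", u + linear + degenerate)
print("linear, degenerate   :", linear, degenerate)
```

In such a run the degenerate term is typically much smaller than the linear term, which mirrors the role of Lemma 3.4 in the proof: the linear part carries the Gaussian approximation, while the degenerate part is negligible.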



Proof of Theorem 2. To simplify the notation, we will without loss of generality assume that $U(p) = p = t_p$ on the interval $I$. In the general case, one has to change the function $h(x, y, t)$ to $h(x, y, U^{-1}(p))$, as $E\,h(X, Y, U^{-1}(p)) = U(U^{-1}(p)) = p$. For the related empirical U-process $U_n \circ U^{-1}$, we have



R n {p) = U- x {p)-U- x {p) - 
1 



u(t p ) 



u(tp 



((u n o t/- 1 )- 1 ^ - P -( P -u n o u-\ P ))) + (([/- 1 (p) - u-\ P )Y), 



so Assumption 3 guarantees that $R_n(p)$ is only blown up by a constant because of this transformation. If $U(p) = p = t_p$, then we can write $R_n(p)$ as

\[ \begin{aligned}
R_n(p) &= U_n^{-1}(p) - t_p + U_n(t_p) - p \\
&= \big( U_n^{-1}(p) - U_n^{-1}(U_n(t_p)) + U_n(t_p) - p \big) + \big( U_n^{-1}(U_n(t_p)) - t_p \big).
\end{aligned} \]
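
The split of $R_n(p)$ into two pieces can be reproduced numerically. The Python sketch below is an illustration under assumptions of its own: the kernel $g(x,y) = |x-y|$, independent standard normal observations, and the transformation to $U(t) = t$ carried out by mapping each pairwise distance $d$ to $U(d) = \operatorname{erf}(d/2)$, so that $t_p = p$. The generalized inverse is taken as $\inf\{t : U_n(t) \ge p\}$; all function names are hypothetical.

```python
import bisect
import math
import random


def U(t):
    """P(|X - Y| <= t) for independent N(0,1) variables, equal to erf(t / 2)."""
    return math.erf(t / 2.0)


def transformed_pairs(xs):
    """Sorted values U(|x_i - x_j|), i < j; their empirical df plays the role of U_n o U^{-1}."""
    n = len(xs)
    return sorted(U(abs(xs[i] - xs[j])) for i in range(n) for j in range(i + 1, n))


def edf(vs, t):
    """Empirical distribution function of the sorted values vs at t."""
    return bisect.bisect_right(vs, t) / len(vs)


def gen_inverse(vs, p):
    """inf{t : edf(vs, t) >= p} for 0 < p <= 1."""
    size = len(vs)
    for k, v in enumerate(vs, start=1):
        if k / size >= p:
            return v
    raise ValueError("p must lie in (0, 1]")


def bahadur_pieces(vs, p):
    """Return R_n(p) and its two pieces in the transformed setting t_p = p."""
    q = gen_inverse(vs, p)      # U_n^{-1}(p)
    f = edf(vs, p)              # U_n(t_p)
    back = gen_inverse(vs, f)   # U_n^{-1}(U_n(t_p))
    r = q - p + f - p
    piece1 = q - back + f - p   # handled via Lemma 3.5 and Lemma 3.6
    piece2 = back - p           # handled via the generalized inverse
    return r, piece1, piece2


random.seed(0)
vs = transformed_pairs([random.gauss(0.0, 1.0) for _ in range(150)])
for p in (0.25, 0.5, 0.75):
    r, piece1, piece2 = bahadur_pieces(vs, p)
    print(p, "quantile error:", gen_inverse(vs, p) - p, "R_n:", r, "pieces:", piece1, piece2)
```

The second piece is never positive, since $U_n^{-1}(U_n(t)) \le t$ by the definition of the infimum; this is why only the lower deviation needs a probabilistic argument.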



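Applying Lemma 3.6 and Lemma 3.5 with $F = U_n$, $c = n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n$ and $l = C\sqrt{\frac{\log\log n}{n}}$, we obtain

\[ \sup_{\substack{p,p' \in I: \\ |p-p'| \le C\sqrt{\frac{\log\log n}{n}}}} |U_n^{-1}(p) - U_n^{-1}(p') - (p - p')| = o\big(n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n\big) \]

almost surely. By Corollary 1, we have that $\sup_{p \in I} |U_n(t_p) - p| \le C\sqrt{\frac{\log\log n}{n}}$ almost surely, so it follows that

\[ \begin{aligned}
&\sup_{p \in I} |U_n^{-1}(p) - U_n^{-1}(U_n(t_p)) + U_n(t_p) - p| \\
&\quad \le \sup_{\substack{p,p' \in I: \\ |p-p'| \le C\sqrt{\frac{\log\log n}{n}}}} |U_n^{-1}(p) - U_n^{-1}(p') - (p - p')| = o\big(n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n\big)
\end{aligned} \]

almost surely. It remains to show the convergence of $U_n^{-1}(U_n(t_p)) - t_p$. For every $\epsilon > 0$, by the definition of the generalized inverse, $U_n^{-1}(U_n(t_p)) - t_p > \epsilon n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n$ only if $U_n(t_p + \epsilon n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n) < U_n(t_p)$, and $U_n^{-1}(U_n(t_p)) - t_p < -\epsilon n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n$ only if $U_n(t_p - \epsilon n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n) \ge U_n(t_p)$. So we can conclude that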



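\[ \begin{aligned}
&P \Big( \sup_{p \in I} |U_n^{-1}(U_n(t_p)) - t_p| > \epsilon n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n \ \text{i.o.} \Big) \\
&\quad \le P \Big( \inf_{t \in [C_1,\, C_2 - \epsilon n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n]} \big( U_n(t + \epsilon n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n) - U_n(t) \big) \le 0 \ \text{i.o.} \Big) \\
&\quad \le P \Big( \sup_{\substack{t,t' \in [C_1,C_2]: \\ |t-t'| = \epsilon n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n}} \big( |U_n(t) - U_n(t') - (U(t) - U(t'))| - |U(t) - U(t')| \big) \ge 0 \ \text{i.o.} \Big) \\
&\quad \le P \Big( \sup_{\substack{t,t' \in [C_1,C_2]: \\ |t-t'| \le \epsilon n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n}} |U_n(t) - U_n(t') - (U(t) - U(t'))| \ge \frac{\epsilon \log n}{n^{\frac{1}{2}+\frac{\gamma}{8}}} \inf_{t \in [C_1,C_2]} u(t) \ \text{i.o.} \Big) \\
&\quad = 0,
\end{aligned} \]

where the last line is a consequence of Lemma 3.6. We have proved that $\sup_{p \in I} |R_n(p)| = o\big(n^{-\frac{1}{2}-\frac{\gamma}{8}} \log n\big)$, and can finally conclude that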



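\[ \begin{aligned}
&\sup_{\substack{p \in I \\ s \in [0,1]}} \frac{n^{\frac{\gamma}{8}}}{\log n} \frac{\lfloor ns \rfloor}{\sqrt{n}} |R_{\lfloor ns \rfloor}(p)| \\
&\quad \le \max_{n' < \sqrt{n}}\ \sup_{p \in I} \frac{n^{\frac{\gamma}{8}}}{\sqrt{n} \log n}\, n' |R_{n'}(p)| + \max_{\sqrt{n} \le n' \le n}\ \sup_{p \in I} \frac{n^{\frac{\gamma}{8}}}{\sqrt{n} \log n}\, n' |R_{n'}(p)| \\
&\quad \le C n^{-\frac{1}{4}+\frac{\gamma}{16}} \sup_{n' \in \mathbb{N}}\ \sup_{p \in I} \frac{(n')^{\frac{1}{2}+\frac{\gamma}{8}}}{\log n'} |R_{n'}(p)| + \sup_{n' \ge \sqrt{n}}\ \sup_{p \in I} \frac{(n')^{\frac{1}{2}+\frac{\gamma}{8}}}{\log n'} |R_{n'}(p)| \to 0.
\end{aligned} \]

□

Proof of Theorem 3. Define $K'(p, s) := -\frac{1}{u(t_p)} K(t_p, s)$, where $K$ is the Gaussian process introduced in Theorem 1. $K'$ is then a Gaussian process with covariance function

\[ E K'(p, s) K'(p', s') = \min\{s, s'\}\, \frac{1}{u(t_p) u(t_{p'})}\, \Gamma(t_p, t_{p'}) \]

and by Theorem 1 and Theorem 2

\[ \begin{aligned}
&\sup_{\substack{p \in I \\ s \in [0,1]}} \frac{1}{\sqrt{n}} \Big| \lfloor ns \rfloor \big( U_{\lfloor ns \rfloor}^{-1}(p) - t_p \big) - K'(p, ns) \Big| \\
&\quad \le \sup_{\substack{p \in I \\ s \in [0,1]}} \frac{\lfloor ns \rfloor}{\sqrt{n}} |R_{\lfloor ns \rfloor}(p)| + \sup_{\substack{p \in I \\ s \in [0,1]}} \frac{1}{\sqrt{n}\, u(t_p)} \Big| \lfloor ns \rfloor \big( U_{\lfloor ns \rfloor}(t_p) - p \big) - K(t_p, ns) \Big| \\
&\quad \le \sup_{\substack{p \in I \\ s \in [0,1]}} \frac{\lfloor ns \rfloor}{\sqrt{n}} |R_{\lfloor ns \rfloor}(p)| + \frac{1}{\inf_{p \in I} u(t_p)} \sup_{\substack{p \in I \\ s \in [0,1]}} \frac{1}{\sqrt{n}} \Big| \lfloor ns \rfloor \big( U_{\lfloor ns \rfloor}(t_p) - p \big) - K(t_p, ns) \Big| \\
&\quad = O\big(\log^{-\frac{1}{3840}} n\big)
\end{aligned} \]

almost surely. □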

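Proof of Theorem 4. If $\sigma^2 > 0$, set

\[ B(s) = \frac{1}{\sigma} T(K'(\cdot, s)) = \frac{1}{\sigma} \Big( \int_I J(p) K'(p, s)\, dp + \sum_{j} b_j K'(p_j, s) \Big). \]

In the case $\sigma^2 = 0$, $B$ may be an arbitrary Brownian motion. As $J$ is a bounded function, $T$ is a linear and Lipschitz continuous functional (with respect to the supremum norm), so

\[ \begin{aligned}
&\sup_{s \in [0,1]} \frac{1}{\sqrt{n}} \Big| \lfloor ns \rfloor \big( T(U_{\lfloor ns \rfloor}^{-1}) - T(U^{-1}) \big) - \sigma B(ns) \Big| \\
&\quad = \sup_{s \in [0,1]} \frac{1}{\sqrt{n}} \Big| T\big( \lfloor ns \rfloor (U_{\lfloor ns \rfloor}^{-1} - U^{-1}) - K'(\cdot, ns) \big) \Big| \\
&\quad \le C \sup_{\substack{p \in I \\ s \in [0,1]}} \frac{1}{\sqrt{n}} \Big| \lfloor ns \rfloor \big( U_{\lfloor ns \rfloor}^{-1}(p) - t_p \big) - K'(p, ns) \Big| = O\big(\log^{-\frac{1}{3840}} n\big).
\end{aligned} \]

It remains to show that $B$ is a Brownian motion. Clearly, $EB(s) = 0$ for every $s \ge 0$. By the linearity of $T$, $B$ is a Gaussian process with stationary independent increments.

□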